On the relation between native geometry and conformational plasticity 
In protein folding the term plasticity refers to the number of alternative folding pathways encountered
in response to free energy perturbations such as those induced by mutation. Here we explore
the relation between folding plasticity and a gross, generic feature of the native geometry, namely, 
the relative number of local and non-local native contacts. The results from our study, which is 
based on Monte Carlo simulations of simple lattice proteins, show that folding to a structure that 
is rich in local contacts is considerably more plastic than folding to a native geometry characterized 
by having a very large number of long-range contacts (i.e., contacts between amino acids that are 
separated by more than 12 units of backbone distance). The smaller folding plasticity of 'non-local' 
native geometries is probably a direct consequence of their higher folding cooperativity that renders 
the folding reaction more robust against single- and multiple-point mutations. 

PACS numbers: 87.15.Cc; 87.15.ak; 87.15.hm; 87.15.hp

Keywords: mutagenesis, $\phi$-value analysis, folding pathways, long-range contacts






I. INTRODUCTION 

During the last 15 years significant progress has been 
achieved towards a complete understanding of the kinet- 
ics and mechanisms of protein folding. The synergistic 
link between computer simulations and in vitro experiments
has proven particularly fruitful in this endeavour [1, ?].
Much of our current knowledge on protein folding has been
gathered by studying small (i.e., with fewer than 100 amino
acids) monomeric proteins, epitomized by the likes of the
64-residue protein Chymotrypsin
Inhibitor 2 (CI2). Indeed, their two-state folding kinet- 
ics renders them particularly suitable models to in- 
vestigate, both in vitro and in silico, the rather complex 
phenomenon that is the folding 'reaction'. 

A major and challenging task in studying two-state
proteins is the structural characterization of the folding
transition state (TS), located at the top of the free energy
barrier that separates the native fold from the ensemble
of unfolded conformations. Indeed, due to its fleeting
nature, the commonly available biophysical tools have
proved inappropriate to probe the structure of the TS. Thus,
experimental studies of the TS have remained predominantly
rooted in the use of a particular class of protein
engineering methods, the so-called $\phi$-value analysis,
developed by Fersht and coworkers back in the late
1980s [?]. In the $\phi$-value analysis a non-disruptive mutation
(i.e., a mutation that does not change the structure
of the native state, and does not alter the folding
pathway either) is made at some position in the protein
sequence [12]. The change in the activation energy of
folding of the mutant with respect to that of the wild-type
(WT) protein, denoted by $\Delta\Delta G^{\ddagger-D}$, is measured
together with the change in the free energy of folding,
$\Delta\Delta G^{N-D}$, caused by the mutation. The corresponding
$\phi$ value is then defined as the ratio between these quantities,
namely, $\phi = \Delta\Delta G^{\ddagger-D} / \Delta\Delta G^{N-D}$. A value of
unity means that the energy of the TS is perturbed upon
mutation exactly as much as that of the native state,
which has traditionally been interpreted as meaning that the
protein structure is folded at the site of mutation in the TS.
Conversely, residues that are unfolded in the TS exhibit
$\phi$-values of zero.
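
As a minimal numerical sketch of the ratio defined above, the following Python function computes a $\phi$-value from the two free energy changes. The function name, the argument names and the numerical values are illustrative assumptions and are not data from this study.

```python
def phi_value(ddg_ts_d: float, ddg_n_d: float) -> float:
    """Return phi = ddG(TS-D) / ddG(N-D).

    ddg_ts_d: change in the activation free energy of folding upon mutation.
    ddg_n_d:  change in the free energy of folding upon mutation.
    Both values must be given in the same units.
    """
    if ddg_n_d == 0.0:
        # The ratio is undefined when the mutation leaves stability unchanged.
        raise ValueError("phi is undefined for a mutation with zero ddG(N-D)")
    return ddg_ts_d / ddg_n_d


# Hypothetical inputs: TS perturbed as much as the native state, not at all,
# and only partially (a fractional phi-value).
print(phi_value(2.0, 2.0))
print(phi_value(0.0, 2.0))
print(phi_value(0.5, 2.0))
```

The explicit check for a vanishing denominator reflects the practical point that $\phi$ is only meaningful for mutations that measurably change the stability of the native state.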

The traditional interpretation of fractional $\phi$ values is,
however, not straightforward, as they might indicate the
existence of multiple folding pathways or a unique TS
with genuinely weakened interactions [?]. Moreover, the
so-called nonclassical $\phi$-values ($\phi > 1$ or $\phi < 0$) are
difficult to interpret in the traditional $\phi$-value model.
Recently, Weikl and co-workers have proposed a new
model that is able to capture and interpret nonclassical
$\phi$-values [?, 13]. The model assumes that cooperative
structural elements (e.g., an $\alpha$-helix or a $\beta$-hairpin) are
fully formed in the TS, and that a mutation on a single
residue affects the whole structural element where
the native contacts established by the mutated residue
are located. Likewise, the activation energy of folding
in the definition of the $\phi$-values contains explicit free
energy contributions from different substructural elements
of the protein. This model was recently applied to study
two small $\beta$-sheet domains, namely the FBP and PIN
WW domains, which were assumed to fold via two distinct
folding pathways (and transition states), and it was
able to reproduce $\phi$-values in good agreement with those
reported experimentally [11].

The concept of conformational/folding plasticity [?],
or folding malleability [13], refers to the number of alternative
folding pathways encountered in response to
free energy perturbations such as those induced by solvent
changes or mutation; the larger that number, the
broader the TS, and the larger the plasticity. Although
it is experimentally very difficult to investigate in a direct
manner the number of folding pathways leading to
the native state, evidence for the existence of conformational
plasticity was found in a few cases by means of protein
engineering experiments. For example, single point
mutations in the N-terminal domain of the monomeric
$\lambda_{6-85}$-repressor revealed a broad TS [?]. The use of
circular permutation methods (a less mild protein engineering
technique where the protein's backbone is cut
at some point, and its original N- and C-terminal
parts are linked together in a process that leads to minimal
alteration of the native fold) revealed considerable
structural changes in the TSs of both the $\alpha$-spectrin SH3
domain [15] and the src SH3 domain, and even more
dramatic modifications in the TS of the ribosomal protein
S6 [17], a finding that was corroborated by computer
simulation studies [?]. The plasticity exhibited
by these proteins has been taken as evidence for the
existence of multiple folding pathways [?].

In a recent study, Klimov and Thirumalai investigated
the influence of the distribution of secondary structural
elements along the protein sequence using an off-lattice
coarse-grained model and Langevin molecular dynamics
simulations. The major result of their study is that a
symmetric distribution of $\alpha$-helices and $\beta$-sheets
with respect to the sequence midpoint favours multiple
folding pathways [26].

A study of the validity of the $\phi$-value analysis as a
tool for identifying critical (i.e., nucleating) residues in
protein folding and providing a structural characterization
of the TS was recently presented in [?]. This study
involved measuring the folding time (i.e., the inverse of
the folding rate) of all possible single point mutations and
many double point mutations in two model proteins with
different native geometries. Here we propose to explore
the link between conformational plasticity and a gross geometric
trait of the native structure, the relative number
of local and non-local native contacts, by using as starting
point the mutational data reported in [43]. For each
native geometry, we selected a number of mutations with
different kinetic effects. We investigate the relation between
folding plasticity and native geometry by studying
the impact of these mutations on the folding 'reaction'
from the point of view of the conformational changes it
encompasses. This is done by monitoring, for each mutant,
the degree of nativeness of each residue along the folding
process and comparing it with that of the wild-type protein. Our
analysis shows that folding to the native geometry that
is rich in non-local long-range (LR) contacts is considerably
less plastic (i.e., more conformationally robust) than
folding to the model protein that has predominantly local
contacts, and that this is possibly a direct consequence of
the more cooperative folding transition exhibited by the
non-local geometry. This reinforces the results of [43], according
to which the picture of the TS emerging from the
$\phi$-value analysis is more reliable when applied to target
proteins having a distinctively large number of non-local
native contacts. Therefore, if $\phi$-value mutational data
are to be interpreted in the traditional way, which is so
far the most commonly adopted view, it is important to
select native folds where the number of LR native contacts
is sharply dominant; they show a higher robustness
against mutation and, at the coarse level of contact clusters,
they tend to fold in a Levinthal-like manner, i.e., as
single-route folding proteins.

This article is organized in the following manner. After 
the introductory section, the protein models and simula- 
tion methodologies are described. Then, we present and 
discuss the results from simulations. Finally, in the last 
section we draw some concluding remarks. 



II. MODELS AND METHODS 
A. The Go model and simulation details 

We consider a simple three-dimensional lattice model
of a protein molecule with chain length $N = 48$. In such
a minimalist model amino acids, represented by beads of
uniform size, occupy the lattice vertices, and the peptide
bond, which covalently connects amino acids along the
polypeptide chain, is represented by sticks of uniform
(unit) length corresponding to the lattice spacing.

To mimic protein energetics we use the Go model [27].
In the Go model the energy of a conformation, defined by
the set of bead coordinates $\{\vec{r}_i\}$, is given by the contact
Hamiltonian

$$H(\{\vec{r}_i\}) = \sum_{i>j}^{N} \epsilon \, \Delta(\vec{r}_i - \vec{r}_j), \qquad (1)$$

where the contact function $\Delta(\vec{r}_i - \vec{r}_j)$ is unity only if
beads $i$ and $j$ form a non-covalent native contact, i.e.,
a contact between a pair of beads that is present in the
native structure, and is zero otherwise. The Go potential
is based on the idea that the native fold is very well optimized
energetically. Accordingly, it ascribes equal stabilizing
energies (e.g., $\epsilon = -1.0$) to all the native contacts
and neutral energies ($\epsilon = 0$) to all non-native contacts.
The motivation to use the Go model is based on the well
accepted finding that for small, single domain two-state
proteins, the geometry of the native fold is the major
determinant of folding kinetics [1, ?, ?].
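
The contact Hamiltonian of Eq. (1) can be illustrated with a short Python sketch. The function name, the data layout (a list of integer lattice coordinates and a set of native contact pairs) and the six-bead toy chain are assumptions made for illustration; they are not the simulation code used in this work.

```python
def go_energy(coords, native_contacts, eps=-1.0):
    """Energy of a lattice conformation under the Go contact Hamiltonian.

    coords: list of (x, y, z) integer lattice positions, one per bead.
    native_contacts: set of (i, j) bead pairs, i > j, that are non-covalent
        contacts in the native structure.
    eps: energy ascribed to each formed native contact; non-native contacts
        contribute zero and are therefore never visited.
    """
    energy = 0.0
    for i, j in native_contacts:
        # Two beads are in contact when they sit on adjacent lattice sites.
        separation = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if separation == 1:
            energy += eps
    return energy


# Toy six-bead chain (hypothetical): its native fold has two native contacts.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0)]
contacts = {(3, 0), (5, 2)}
print(go_energy(native, contacts))
```

Because only native pairs carry a non-zero energy, the sum over all pairs $i > j$ reduces to a loop over the native contact list.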

FIG. 1: Three-dimensional representation of geometry 1 and geometry 2 (top row) and corresponding contact maps
(bottom row). Each square in the contact map represents a native contact. For structures that, like ours, are maximally
compact cuboids with $N = 48$ residues there are 57 native contacts. A non-local contact between two residues $i$ and $j$ is defined
as LR if their sequence separation is at least 12 units, i.e., $|i - j| > 12$. Accordingly, the number of LR (white squares)
contacts in geometry 1 is 19 and in geometry 2 is 42.

In order to mimic the protein's relaxation towards the
native state we use a standard Monte Carlo (MC) algorithm
together with the kink-jump move set [28]. Accordingly,
local random displacements of one or two beads
(at the same time) are repeatedly accepted or rejected in
accordance with the standard Metropolis MC rule [29].
An MC simulation starts from a randomly generated unfolded
conformation and the folding dynamics is monitored
by following the evolution of the fraction of native
contacts, $Q = q/L$, where $L$ is the number of contacts in
the native fold and $q$ is the number of native contacts
formed at each MC step. The number of MC steps required
to fold to the native state (i.e., to achieve $Q = 1.0$)
is the first passage time (FPT) and the folding time, $t$, is
computed as the mean FPT of 100 simulations. Unless
otherwise stated, folding is studied at the so-called optimal
folding temperature, the temperature that minimizes
the folding time [?]. The folding transition
temperature, $T_f$, is defined as the temperature at which
denatured states and the native state are equally populated
at equilibrium. In the context of a lattice model it
can be defined as the temperature at which the average
value $\langle Q \rangle$ of the fraction of native contacts is equal to
0.5 [13]. In order to determine $T_f$ we averaged $Q$, after
collapse to the native state, over MC simulations lasting
~ 10^ MCS. 
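
The acceptance step and the folding-time estimate described above can be sketched as follows. This is a minimal illustration that assumes temperature is expressed in energy units (i.e., $k_B = 1$, which the text does not state explicitly); the function names are hypothetical and the kink-jump move generation itself is not shown.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Standard Metropolis rule for a proposed move with energy change delta_e.

    Moves that do not raise the energy are always accepted; uphill moves are
    accepted with probability exp(-delta_e / temperature).
    """
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def fraction_native(q, n_native):
    """Fraction of native contacts, Q = q / L."""
    return q / n_native


def mean_first_passage_time(fpts):
    """Folding time estimated as the mean first passage time over runs."""
    return sum(fpts) / len(fpts)


# Q = 1.0 signals that the native state has been reached (L = 57 here).
print(fraction_native(57, 57))
```

Passing the random number generator as an argument keeps the rule deterministic under test, since a stub generator can be substituted for the `random` module.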



B. Target geometries 



Two native folds, which are amongst the 'simplest'
(geometry 1) and the most 'complex' (geometry 2)
cuboid geometries found through lattice simulations of
homopolymer relaxation, were considered in this study
(Figure 1). A non-local contact is considered long-range
(LR) if the two beads participating in the contact are separated
by more than 12 units of backbone distance [?].
Accordingly, 33% of the native contacts in geometry 1
are LR, while geometry 2 has 74% LR contacts.
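
The LR fraction quoted above follows directly from a contact list. The sketch below assumes a hypothetical representation of the native contacts as pairs of residue indices and exposes the sequence-separation cutoff as a parameter; the last line simply checks the quoted percentages against the contact counts given for the two geometries (19 and 42 LR contacts out of 57).

```python
def long_range_fraction(native_contacts, cutoff=12):
    """Fraction of native contacts whose sequence separation exceeds cutoff."""
    n_lr = sum(1 for i, j in native_contacts if abs(i - j) > cutoff)
    return n_lr / len(native_contacts)


# Hypothetical four-contact map: two of the four contacts are long-range.
toy_contacts = {(3, 0), (13, 0), (20, 2), (8, 5)}
print(long_range_fraction(toy_contacts))

# Percentages implied by the LR counts reported for geometry 1 and geometry 2.
print(round(100 * 19 / 57), round(100 * 42 / 57))  # 33 74
```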

III. RESULTS
A. A picture of the folding reaction 

The mechanistic equivalent of the experimental $\phi$-value
of residue $i$ at time $t$, $\phi_i^{mec}(t)$, is given by the number
of native contacts, $q_i^{\Gamma}(t)$, the residue establishes in
conformation $\Gamma$, normalised to the number of contacts
it establishes in the native fold, $q_i^{native}$, i.e.,
$\phi_i^{mec} = q_i^{\Gamma} / q_i^{native}$. A residue is fully native if
$\phi_i^{mec} = 1$. Thus, $\phi_i^{mec}$ provides a direct measure of structure
formation.
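
A minimal sketch of this per-residue measure is given below. The function name, the coordinate layout and the six-bead toy chain are illustrative assumptions; the only ingredient taken from the text is the ratio between the native contacts a residue has formed and those it makes in the native fold.

```python
def phi_mec(coords, native_contacts, residue):
    """Fraction of the native contacts of `residue` formed in a conformation.

    coords: list of (x, y, z) integer lattice positions, one per bead.
    native_contacts: set of (i, j) pairs in contact in the native fold.
    """
    partners = [pair for pair in native_contacts if residue in pair]
    if not partners:
        raise ValueError("residue makes no native contacts")
    formed = 0
    for i, j in partners:
        # A native contact is formed when the two beads are lattice neighbours.
        if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
            formed += 1
    return formed / len(partners)


# Toy chain (hypothetical): beads 4 and 5 have swung away from the native fold,
# so residue 5 has lost its only native contact while residue 0 keeps its own.
contacts = {(3, 0), (5, 2)}
partial = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (-1, 1, 0), (-1, 2, 0)]
print(phi_mec(partial, contacts, 0), phi_mec(partial, contacts, 5))
```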

In what follows we use $\phi_i^{mec}$ to obtain a picture of the
folding reactions leading to geometry 1 and geometry 2,
and we use the fraction of native contacts as a progress
variable. In particular, for each value of the fraction of native
contacts, $Q$, we compute the number of times each residue is fully
native (i.e., $\phi_i^{mec} = 1$) and normalize it to the total number
of times $Q$ is counted in the course of a folding event.
The probability thus computed is then averaged over an
ensemble of 100 independent folding simulations. We remark
that although the probability to fold, $P_{fold}$, is the
most accurate progress variable in lattice simulations of
protein folding [38], its use for the present
purposes is computationally prohibitive, as it would have to
be evaluated for every conformation sampled in a folding
event. This is why we use instead the fraction of native
contacts, which was recently shown to measure correctly
the degree of closeness to the native fold [?] for proteins
with smooth energy landscapes (i.e., single exponential
kinetics) [?], like those considered here.
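
The bookkeeping behind these folding patterns can be sketched for a single folding event as follows. The trajectory format (one record per sampled conformation, holding the number of formed native contacts and the list of per-residue nativeness values) is an assumption made for illustration; averaging the returned probabilities over independent runs would follow the same pattern.

```python
from collections import defaultdict


def fully_native_probability(trajectory, n_residues):
    """Probability that each residue is fully native at each value of q.

    trajectory: iterable of (q, phi) records, where q is the number of native
        contacts formed (so that Q = q / L) and phi is a list with the
        nativeness of each residue in that conformation.
    Returns a dict mapping q to a list of per-residue probabilities.
    """
    visits = defaultdict(int)
    native_counts = defaultdict(lambda: [0] * n_residues)
    for q, phi in trajectory:
        visits[q] += 1
        for residue, value in enumerate(phi):
            if value == 1.0:
                native_counts[q][residue] += 1
    # Normalize by the number of times each q was visited in this event.
    return {q: [c / visits[q] for c in native_counts[q]] for q in visits}


# Hypothetical two-residue event: q = 1 is visited twice, q = 2 once.
event = [(1, [1.0, 0.0]), (1, [1.0, 1.0]), (2, [1.0, 1.0])]
print(fully_native_probability(event, 2))
```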

Not surprisingly, the folding pattern of geometry 1
(Figure 2, top left), i.e., the conformational changes leading
to the native fold, is clearly distinct from that exhibited
by geometry 2 (Figure 3, top left). For example,
from early to mid folding (i.e., $0.26 < Q < 0.53$), there
are more residues in geometry 2 with a higher probability
of being in their native environment. These probabilities
decrease sharply immediately prior to collapse into the
native fold (i.e., $0.79 < Q < 1.0$). On the other hand,
for the vast majority of the residues belonging to geometry 1,
the probability of being fully native increases in a
rather continuous way from early to late folding.

B. Native geometry and conformational plasticity 

Within the context of the Go model a single-point mutation
is equivalent to replacing the set of native contacts
established by the mutated residue with neutral contacts,
i.e., contacts to which zero energy is ascribed. In a recent
study we mutated every single residue in each
protein model and measured the change in the folding
time of the mutant relative to the wild-type (WT) sequence
(Figure 4); an extensive number of double point
mutations was also performed [?]. We now ask how
the folding patterns of the two protein models described
above change upon mutation. To answer this question
we have selected single point mutations that produce different
kinetic effects. For both geometries we have investigated
the effect on conformational plasticity of the most
deleterious and of the milder forms of single point mutations.
We have also considered the impact on the conformational
changes occurring during folding of the most
deleterious double point mutations. Thus, for geometry 1
we have looked into the folding patterns obtained upon
mutating residue 29 (which leads to the largest folding
time increase, of 200%), residues 5 and 36 (neutral mutations
located below and above the sequence midpoint,
respectively), and also residue 8 (which decreases the WT
protein folding time by 16.2%) (Figure 4). A simultaneous
mutation of residues 20 and 30, which is the
most deleterious double point mutation for geometry 1,
increases the folding time of the WT protein by almost
two orders of magnitude. We have also investigated
its impact on conformational plasticity. The pattern of
the folding reaction associated with any one of these mutations
is sharply different from that exhibited by the
WT protein (Figure 2), which indicates that the less
complex geometry has access to several folding pathways.
For the more complex geometry 2, we selected the
mutation on residue 36 (which produces the largest increase
in folding time, of 600%), on residue 24 (which is
the only mutation for which a vanishingly small change of
1.2% in the folding time is observed), and also on residue
17 (which leads to a mild increase of 27% in the WT protein's
folding time). We have also investigated the effects
of two double point mutations, namely on residues
7 and 34, and on residues 35 and 36. These mutations
increase the WT protein's folding time by more than two
orders of magnitude [?]. A scenario considerably different
from that reported for geometry 1 is observed for
geometry 2 (Figure 3). Indeed, in this case, the folding
pattern is considerably more robust, which suggests that
the search for the native fold is constrained to follow a
fixed sequence of conformational changes, i.e., that this
geometry has a smaller conformational plasticity.
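
The way a mutation enters the Go model, and the way its kinetic effect is quantified, can be sketched as below. The dictionary representation of the contact energies and both function names are hypothetical; the folding times in the example are made-up numbers chosen only to reproduce a 200% increase.

```python
def mutate(contact_energies, residues, neutral=0.0):
    """Return a copy of the contact energies with mutated residues neutralised.

    contact_energies: dict mapping native contact pairs (i, j) to energies.
    residues: residues to mutate (one for a single-point mutation, two for
        a double-point mutation).
    Every native contact involving a mutated residue receives zero energy.
    """
    mutated = set(residues)
    return {
        pair: (neutral if mutated & set(pair) else energy)
        for pair, energy in contact_energies.items()
    }


def relative_change(t_mutant, t_wild_type):
    """Percentage change in folding time relative to the wild type."""
    return 100.0 * (t_mutant - t_wild_type) / t_wild_type


# Hypothetical wild-type contact energies; residue 5 is mutated.
wild_type = {(3, 0): -1.0, (5, 2): -1.0, (5, 0): -1.0}
print(mutate(wild_type, [5]))
# A mutant that folds three times more slowly shows a 200% increase.
print(relative_change(3.0e6, 1.0e6))
```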



C. Long-range contacts, the structure of the 
transition state and the cooperativity of the folding 
transition 

As discussed below, the large number of LR contacts 
in geometry 2 protects folding from perturbation, which 
decreases the conformational plasticity, in two ways: i) 
by increasing the robustness of the transition state's structure
against mutation and ii) by increasing the cooperativity
of the folding transition.

Figure 5 (straight line) reports the probability that
each protein residue is fully native in the TS ($Q \sim 0.5$)
of geometry 1 (left) and geometry 2 (right), and how this
probability changes upon mutation. The structure of the
TS of geometry 2 is clearly more robust against mutation
than that of geometry 1. Indeed, most mutations in geometry 1
lead to a large structural consolidation between
residues 12 and 33. In geometry 2, the most deleterious
double-point mutations disrupt networks of non-local LR
contacts that form the core structure of the TS [?]. Since
the formation of LR contacts greatly restricts the number
of conformations available to the folding chain, the establishment
of LR contacts is generally entropically costly.
Because the LR contacts are clearly dominant in the native
fold of geometry 2, it is highly unlikely that, upon the
mutation of one or more of its residues, folding can occur
through an energetically competitive TS, as this would
imply changing the TS structure by substantially increasing
its content of local (i.e., less entropically costly)
native contacts [3, ?]. Thus, for the non-local geometry 2,
the robustness of its TS is a direct consequence of
the fold's high content of LR contacts.

FIG. 2: Folding patterns of geometry 1: probability that each residue is in its native environment along the reaction coordinate
fraction of native contacts, $Q$, for the wild type (top left) and mutated sequences in geometry 1.

FIG. 3: Folding patterns of geometry 2: probability that each residue is in its native environment along the reaction coordinate
fraction of native contacts, $Q$, for the wild type (top left) and mutated sequences in geometry 2.

In protein folding the formation (and breaking) of native
contacts in a non-independent manner (i.e., cooperatively)
results in the depletion of partially folded
conformational states, which translates into a kinetic behaviour
that fits remarkably well a two-state model. Indeed,
at the transition midpoint of a two-state folding
'reaction', half of the protein molecules in the test tube
are folded and half of them are in the coil state. Thus, microscopically,
a cooperative folding transition is characterized by
a bimodal distribution of protein molecules over energy,
fraction of native contacts, $Q$, or any other observable
parameter [21]. Data reported in Figure 6 (top) show that
at the transition temperature, $T_m$, the population distribution
of the fraction of native contacts for geometry 2,
$P(Q)$, is more strongly bimodal than that of geometry 1,
showing that for geometry 2 the folding transition is
clearly more cooperative [34]. The stronger cooperative
stabilization of the native state of geometry 2 relative
to that of geometry 1 is more evident from the corresponding
free energy curve, $F(Q) = -kT \ln P(Q)$, where
a very high free energy maximum separates the native
conformation from the ensemble of unfolded conformers
(Figure 6, bottom) [23]. Mechanistically, the strong
cooperative folding of geometry 2 restricts the number of
allowed conformational changes, and as a consequence,
the number of alternative folding trajectories (i.e., the
conformational plasticity) is smaller for this more complex
geometry.
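
The free energy profile $F(Q) = -kT \ln P(Q)$ can be obtained from sampled values of $Q$ with a few lines of Python. The sketch below assumes that the samples are already discretised (as they are on a lattice, where $Q = q/L$) and uses a made-up bimodal sample; neither the function name nor the numbers come from the simulations reported here.

```python
import math
from collections import Counter


def free_energy_profile(q_samples, kT=1.0):
    """Return {Q: F(Q)} with F(Q) = -kT * ln P(Q).

    P(Q) is estimated as the relative frequency of each sampled value of Q.
    Values of Q that were never sampled are absent from the result.
    """
    counts = Counter(q_samples)
    total = len(q_samples)
    return {q: -kT * math.log(n / total) for q, n in sorted(counts.items())}


# Hypothetical bimodal sample: unfolded (Q = 0.1) and native (Q = 1.0) states
# are well populated, while intermediate conformations (Q = 0.5) are rare.
samples = [0.1] * 4 + [0.5] * 1 + [1.0] * 5
profile = free_energy_profile(samples)
print(max(profile, key=profile.get))  # 0.5
```

In this picture a strongly bimodal $P(Q)$ translates into a high free energy maximum at intermediate $Q$, which is the signature of a cooperative transition discussed above.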



IV. CONCLUSION 

According to the statistical or landscape view of protein
folding the native state can be reached through a myriad
of microscopic parallel pathways. This 'new' view contrasts
with the so-called traditional view that envisages folding
as a Levinthal-like search, where the elements of the
native structure assemble in a well-defined order. Thus,
identifying the number of folding pathways accessible to
a protein chain is a crucial question in molecular biology.
Recent molecular dynamics simulations of the unfolding
of CI2 at high temperature showed that the observation
of sequential or multiple unfolding pathways
depends on the 'resolution', i.e., the level of structural detail,
at which one observes the unfolding process. Indeed, at
the level of individual contact formation the unfolding
of CI2 happens through highly parallel folding pathways,
while at the coarser level of contact clusters sequential
folding events emerge. Experimentally, however, the
observation of multiple folding pathways has proven a
much more challenging task. Indirect evidence for
alternative routes to the native state has been reported
for engineered proteins only, and the term folding or conformational
plasticity was coined to denote the multiplicity
of folding pathways identified for the perturbed
(i.e., mutated) system. Folding plasticity for small perturbations
(i.e., mild mutations) is taken as an indication
that multiple folding routes exist for the wild-type protein.
Here we have investigated the effect of single- and
double-point mutations on the robustness of the folding
reaction (and TS structure) leading to two distinct protein
geometries. More specifically, we have investigated
how the probability that each residue is fully native (i.e.,
that it has all its native contacts formed) evolves during
the folding process, and how this evolution responds to
mutation. Our findings suggest that folding to native geometries
that are distinctively rich in long-range native
contacts is more robust against mutation than folding
to native geometries where the number of local native
contacts is dominant. Indeed, 'local' geometries seem to
be able to find alternative folding routes in response to
mutation, while non-local geometries appear to behave
in a Levinthal-like manner, their folding reaction being
constrained to follow a fixed sequence of conformational
changes, when monitored at the coarse level of contact
cluster (i.e., the set of native contacts established by each
residue). In other words, the conformational plasticity is
expected to be small for target proteins where the number
of non-local native contacts is sharply larger than the
number of local contacts. These findings are supported
by experimental results reported for the $\lambda_{6-85}$-repressor
(an $\alpha$-protein that is rich in local contacts) and for protein
CI2 and the src SH3 domain ($\alpha/\beta$ proteins rich in non-local
contacts); while the TSs of the latter were found to be
largely invariant against single and double point mutations [3, 13],
substantial changes were found for the former.
Interestingly, the more complex native fold, i.e., the
one that is rich in non-local native contacts, is clearly less
symmetric than the more local geometry, suggesting that
a relation similar to that found by Klimov and Thirumalai
between the symmetry of the tertiary structure
and the folding plasticity [26] may also hold for lattice
proteins.

FIG. 4: Change in folding time relative to the wild-type protein
resulting from single point mutations [?].

Mechanistically, the lower folding plasticity of the more
complex native geometries is most probably a direct consequence
of their more cooperative folding transition,
which results from a large content of long-range native
contacts. In agreement with this explanation, experimental
evidence for high conformational plasticity, resulting
from a weak cooperative behaviour, was recently
reported. Indeed, a change in the folding mechanism
was found for two members of the cks family as a result
of weakening the cooperativity of the core of the protein [?].

In our previous work [42] we concluded that native
folds having a distinctively large number of non-local
native contacts are more suitable targets for TS studies
based on the use of $\phi$-value analysis. The results reported
here strengthen this conclusion; indeed, due to
their smaller folding plasticity, model proteins with a distinctively
large number of LR contacts are preferable targets
for TS studies based on the traditional interpretation
of $\phi$-values.

FIG. 5: Probability that a residue is in its native environment in the wild-type transition state (TS) of geometry 1 (left) and
geometry 2 (right). Also shown is the TS change induced by single and double-point mutations.

FIG. 6: Cooperativity of the folding transition. Population histogram for the frequency of occurrence of conformations with
fraction of native contacts $Q$, $P(Q)$ (left), and the free energy computed as $F(Q) = -kT \ln P(Q)$ (right). To compute $P(Q)$
data were averaged after collapse to the native state in long MC simulations lasting over ^ 10® MCS. 